Combined Use of Reinforcement Learning and Simulated Annealing: Algorithms and Applications

Author

  • PÉTER STEFÁN
Abstract

List of symbols:
[a, b], [c, d]   real intervals
∇_w   partial derivatives of a quantity with respect to the weight-vector elements
A_i, B_i, C_i   machining times on machines A, B, and C, respectively
a_ij   jth action in the ith state of the agent
α   learning rate for "mean value type" action-state values
b   vector of boundary conditions in flow-shop scheduling
β   learning rate for "standard deviation type" action-state values
β   "inverse temperature" (Chapter 3)
C   set of constraints (Chapter 5)
C   number of precedence constraints
c, c_i   constant parameters (Chapter 3)
c_i   precedence constraints (Chapter 5)
c   vector of parameters (Chapter 3)
c_ij   cost of delivery between nodes i and j (Chapter 4)
γ   discount factor
D   compound matrix
d_i   distance label on node i
d_ij   distance label on node i (matrix representation)
dim(a)   dimension of vector a
δ   temporal-difference factor
E_π{x}   expected value of x following policy π
E_i   "potential energy"
e_t(s)   eligibility value of state s
e   the natural number, ≈ 2.71...
e_i   base vector in dimension i (Chapter 5)
ε   small positive error
f(x)   transfer function
f'(x), f''(x)   reward functions
g(x)   reward function
g   auxiliary machining-delay vector (Chapter 5)
h'(x), h''(x)   reward functions
I_j   Palmer index for job j
η_ij   visibility of a trail from node i to node j
j_j   the jth job under processing
k   number of actions sharing the same, maximal action-state value
κ   minimum of the difference between each action-state value pair
L(A)   link-state description of node A
λ   decay rate for eligibility traces
M   matrix of machining times
m(i)   number of actions in the ith state (Chapter 2)
n   number of states
ν   small positive redundancy variable
o_1, o_2   abstract machining times
P^a_ss'   probability of getting into state s' from state s via action a
p_i   probability of selecting action a_i
p̂   scaled probability value
p(x)   continuous probability distribution function
p, r   permutation vectors (Chapter 5)
pred_i   predecessor node
pred_ij   predecessor node (matrix representation)
π(s_i, a_ij)   …
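
Several of the symbols above (the action-state values, the selection probabilities p_i, and the inverse temperature β) come together in the Softmax, or Boltzmann, action-selection rule that the thesis combines with simulated annealing. The short Python sketch below shows one common way to compute and sample from these probabilities; the function names and the max-shift used for numerical stability are illustrative choices, not taken from the thesis.

    import numpy as np

    def boltzmann_probabilities(q_values, beta):
        # Softmax / Boltzmann selection probabilities p_i over the action-state
        # values Q(s, a_i); beta is the inverse temperature (larger beta means
        # greedier selection).
        q = np.asarray(q_values, dtype=float)
        z = beta * (q - q.max())        # subtract max(Q) for numerical stability
        expz = np.exp(z)
        return expz / expz.sum()

    def select_action(q_values, beta, rng=None):
        # Draw an action index a_i with probability p_i.
        rng = rng or np.random.default_rng()
        p = boltzmann_probabilities(q_values, beta)
        return rng.choice(len(p), p=p)

As β grows the distribution concentrates on the greedy action, and as β approaches zero it tends toward uniform exploration, which is what makes the inverse temperature a natural annealing variable.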

Similar articles

Simulated Annealing Algorithm for Selecting Suboptimal Cycle Basis of a Graph

The cycle basis of a graph arises in a wide range of engineering problems and has a variety of applications. Minimal and optimal cycle bases reduce the time and memory required for most such applications. One important application of cycle bases in civil engineering is their use in the force method of frame analysis to generate sparse flexibility matrices, which are needed for optimal a...

On the Relationship between Learning Capability and the Boltzmann-Formula

In this paper the combined use of reinforcement learning and simulated annealing is treated. Most simulated annealing methods suggest using heuristic temperature bounds as the basis of annealing. Here, a theoretically established approach tailored to reinforcement learning with a Softmax action-selection policy is shown. An application example of agent-based routing will also be ill...
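
As a rough illustration of how such a coupling can look in practice, the Python sketch below anneals the Softmax temperature inside an ordinary one-step Q-learning loop. The exponential cooling schedule and the caller-supplied env_step(state, action) -> (next_state, reward, done) interface are assumptions made for the example; they do not reproduce the theoretically established temperature bounds derived in the paper.

    import numpy as np

    def softmax_q_learning(env_step, n_states, n_actions, episodes=500,
                           alpha=0.1, gamma=0.95, t_start=1.0, t_end=0.01):
        # Q-learning with Softmax (Boltzmann) exploration whose temperature is
        # annealed from t_start down to t_end over the episodes.
        q = np.zeros((n_states, n_actions))
        rng = np.random.default_rng()
        for ep in range(episodes):
            # generic exponential cooling schedule (illustrative only)
            temp = t_start * (t_end / t_start) ** (ep / max(episodes - 1, 1))
            state, done = 0, False
            while not done:
                z = (q[state] - q[state].max()) / temp
                p = np.exp(z) / np.exp(z).sum()
                action = rng.choice(n_actions, p=p)
                next_state, reward, done = env_step(state, action)
                # standard one-step Q-learning update
                target = reward + gamma * (0.0 if done else q[next_state].max())
                q[state, action] += alpha * (target - q[state, action])
                state = next_state
        return q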

Reinforcement Learning in Neural Networks: A Survey

In recent years, research on reinforcement learning (RL) has focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The use of neural networks enables RL to search for optimal policies more efficiently in several real-life applicat...

Finding the Shortest Hamiltonian Path for Iranian Cities Using Hybrid Simulated Annealing and Ant Colony Optimization Algorithms

  The traveling salesman problem is a well-known and important combinatorial optimization problem. The goal of this problem is to find the shortest Hamiltonian path that visits each city in a given list exactly once and then returns to the starting city. In this paper, for the first time, the shortest Hamiltonian path is achieved for 1071 Iranian cities. For solving this large-scale problem, tw...
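
For context, the Python sketch below shows the simulated-annealing half of such a hybrid on its own: a random tour is repeatedly modified by 2-opt segment reversals, and changes are accepted under the Metropolis criterion. The starting temperature, cooling factor, and iteration count are arbitrary illustrative values, and the ant colony component of the hybrid is not reproduced here.

    import math
    import random

    def tour_length(tour, dist):
        # Length of the closed tour, including the return to the start city.
        n = len(tour)
        return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

    def sa_tsp(dist, t_start=100.0, cooling=0.995, iters=20000):
        # Plain simulated annealing for the TSP with 2-opt neighbour moves.
        n = len(dist)
        tour = list(range(n))
        random.shuffle(tour)
        cur_len = tour_length(tour, dist)
        best, best_len = tour[:], cur_len
        temp = t_start
        for _ in range(iters):
            i, j = sorted(random.sample(range(n), 2))
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # reverse one segment
            cand_len = tour_length(cand, dist)
            # Metropolis criterion: always accept improvements,
            # accept worse tours with probability exp(-delta / temp).
            if cand_len < cur_len or random.random() < math.exp((cur_len - cand_len) / temp):
                tour, cur_len = cand, cand_len
                if cur_len < best_len:
                    best, best_len = tour[:], cur_len
            temp *= cooling
        return best, best_len

The sketch expects dist as a full symmetric distance matrix (a list of lists or similar), which is a hypothetical input format chosen only for this example.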

Evolution of Fuzzy Controllers and Applications

The present chapter deals with the issues related to the evolution of optimal fuzzy logic controllers (FLC) by proper tuning of their knowledge base (KB), using different tools such as least-square techniques, genetic algorithms, the backpropagation (steepest descent) algorithm, ant colony optimization, reinforcement learning, Tabu search, the Taguchi method and simulated annealing. The selection of a p...

Journal title:
Volume / Issue:
Pages:
Publication date: 2003